GIS Overview

GIS Data

There are two main types of spatial data: vector and raster data.

  • Vector: Vectors (also called shapefiles) consist of points, lines and polygons. These shapes are attached to a dataframe, where each row corresponds to a different spatial element.

  • Raster: Rasters are spatially-referenced grids where each cell has one value.


Shapefile

Raster

Coordinate Reference Systems

  • Coordinate reference systems map pairs of numbers to a location on the earth.
  • Geographic Coordinate Systems live on a sphere; here, the units are in decimal degrees
  • Using the WGS84 coordinate system the World Bank Rwanda is located at -1.951 degrees latitude and 30.061 degrees longitude.
  • Projected Coordinate Systems project the earth onto a flat surface. Doing this distorts the earth in some way (shape, area, distance or direction)
  • Using to the World Mercator projection, the World Bank is located 9783995.88 north and 173093.46 east.

GIS in R System

Here, we'll use the following packages

  • sp: Defines classes and methods for spatial objects.
  • rgdal: For processing spatial data, particularly for reading and writing spatial data.
  • rgeos: A vector processing library.
  • raster: Reading, writing, manipulating, analyzing and modeling of gridded spatial data.
  • ggmap: Load basemaps

Getting started

Getting started

  1. Run your master file to load packages and set folder paths
  2. Open a new script for the code in this session
  3. Load the data for this session
# Loading the data
hh_data <- read.csv(file.path(finalData,"HH_data.csv"))

str(hh_data)
## 'data.frame':    422 obs. of  5 variables:
##  $ X                 : int  1 3 4 5 6 7 8 10 11 12 ...
##  $ expend_food_yearly: num  250452 420029 484468 326109 131487 ...
##  $ food_security     : Factor w/ 3 levels "Little Hunger",..: 1 2 1 1 3 3 1 2 3 1 ...
##  $ longitude_scramble: num  30.4 30.4 30.4 30.4 30.4 ...
##  $ latitude_scramble : num  -1.97 -2 -1.99 -1.98 -1.95 ...

Displaying GPS data in a map

Displaying GPS data in a map

  • In this data set, we have coordinates for the location of each household in our sample
  • We can plot the longitude and latitude in a map as points, the same way we would with any other numeric variable
  • To do so, we'll keep using ggplot:
# Plot household coordinates from data set
ggplot() +
  geom_point(data = hh_data, 
             aes(x = longitude_scramble, 
                 y = latitude_scramble),
             size = .7)

Displaying GPS data in a map

# Plot household coordinates from data set
ggplot() +
  geom_point(data = hh_data, 
             aes(x = longitude_scramble, 
                 y = latitude_scramble),
             size = .7)

Displaying GPS data in a map

  • In a regular plot we may want to adjust the scale to the data
  • In a map, however, it's important to avoid distortion so it can be recognized
  • The option coord_map() creates an approximated projection of an area
  • coord_map() works best for smaller areas closer to the equator
  • A more precise (and less efficient) projection can be created by coord_map()
# Plot undistorted household coordinates from data set
ggplot() +
  geom_point(data = hh_data, 
             aes(x = longitude_scramble, 
                 y = latitude_scramble),
             size = .7) +
  coord_quickmap()

Displaying GPS data in a map

# Plot undistorted household coordinates from data set
ggplot() +
  geom_point(data = hh_data, 
             aes(x = longitude_scramble, 
                 y = latitude_scramble),
             size = .7) +
  coord_quickmap()

Use data in plot aesthetics

  • Because we have other information about the households, we can use the map to illustrate the geographic districution of the variables in the data set

  • This is done using the same arguments as we used for non-spatial data:

# Plot undistorted household coordinates from data set
ggplot() +
  geom_point(data = hh_data, 
             aes(x = longitude_scramble, 
                 y = latitude_scramble,
                 color = food_security),
             size = .7) +
  coord_quickmap()

Use data in plot aesthetics

Customize plot

Now let's make a few adjustments to make it better-looking:

  1. Remove the ticks and axis titles
  2. Show little hunger in green, moderate in yellow and severe in red
  3. Add a title to the map and to the legend
# Launch plot device
ggplot() +
  geom_point(data = hh_data,                   # Add points
             aes(x = longitude_scramble, 
                 y = latitude_scramble, 
                 color = food_security),
             size = .7) +
  coord_quickmap() +                           # Make sure the map isn't distorted
  theme_void() +                               # Remove gray color from background
  scale_color_manual(values = c("green",       # Use more intuitive colors
                                "orange",
                                "red")) +
  labs(title="Household Food Security",        # Add titles
       color="Food Security") 

Customize plot

Add a Basemap

  • Ok, we know we are using spatial data, but this does not look like a map
  • Let's add a basemap on the background
  • To do so, we will use a function called get_map, that loads into R a map from a server such as Google Maps

get_map:

  • location: an address, longitude/latitude pair (in that order), or left/bottom/right/top bounding box
  • zoom: from 3 (continent) to 21 (building)
  • maptype: the map look

Add a Basemap

# Choose a point to center the map on
centroid <-  c(30.46586, -2.014172)


# Get map
basemap <- get_map(location = centroid,
                   zoom = 11,
                   maptype = "hybrid")

Add a Basemap

maptype = "roadmap" maptype = "satellite" maptype = "toner" maptype = "watercolor"

Add a basemap

To use the basemap on the background of our map, we replace the ggplot() expression in the regular graph with ggmap():

ggmap(basemap) +                               # Load basemap 
  geom_point(data = hh_data,                   # Add points
             aes(x = longitude_scramble, 
                 y = latitude_scramble, 
                 color = food_security),
             size = .7) +
  coord_quickmap() +                           # Make sure the map isn't distorted
  theme_void() +                               # Remove gray color from background
  scale_color_manual(values = c("green",       # Set colors so they're more intuitive
                                "orange",
                                "red")) +
  labs(title="Household Food Security",        # Add titles
       color="Food Security") 

Add a basemap

Using Spatial Data Frames

Spatial Data Frames

The data we used until now is a data set like any other, with GPS coordinates as variables. We're using the GPS coordinates to create a plot with the same command we used before. However, that's not how spatial data is more commonly organized.

Spatial data usually takes the format of a shapefile. A shapefile has three basic components:

  1. A shape format, that contains the features' geometry
  2. An attribute, that is, a database with information about each shape
  3. An index that links the feature geometry to the database

Loading a shapefile into R

The get_data function from the raster package allows us to download data from GADM (the Database of Global Administrative Areas). Let's download the first an second administrative division for Rwanda:

# First administrative division
rwanda_l1 <- getData('GADM', country='RWA', level=1)
rwanda_l1
## class       : SpatialPolygonsDataFrame 
## features    : 5 
## extent      : 28.86171, 30.89907, -2.839973, -1.04745  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 13
## names       : OBJECTID, ID_0, ISO, NAME_0, ID_1,           NAME_1, HASC_1, CCN_1, CCA_1,   TYPE_1, ENGTYPE_1, NL_NAME_1,                                                            VARNAME_1 
## min values  :        1,  189, RWA, Rwanda,    1,     Amajyaruguru,  RW.ES,     1,     1, Province,  Province,          ,                                   Eastern Province|Province de l'Est 
## max values  :        5,  189, RWA, Rwanda,    5, Umujyi wa Kigali,  RW.SU,     5,     5, Province,  Province,          , Western Province|Province de l'Ouestern Province|Province de l'Ouest
# Second administrative division
rwanda_l2 <- getData('GADM', country='RWA', level=2)

Spatial Dataframe Structure

In R, a spatial dataframe is a list, where one element of the list is a dataframe of variables and other is a dataframe of coordinates.

Access a dataframe containing the variables (except the coordinates)

# Access the dataframe containing the database
rwanda_l1@data
## 'data.frame':    5 obs. of  13 variables:
##  $ OBJECTID : int  1 2 3 4 5
##  $ ID_0     : int  189 189 189 189 189
##  $ ISO      : chr  "RWA" "RWA" "RWA" "RWA" ...
##  $ NAME_0   : chr  "Rwanda" "Rwanda" "Rwanda" "Rwanda" ...
##  $ ID_1     : int  1 2 3 4 5
##  $ NAME_1   : chr  "Amajyaruguru" "Amajyepfo" "Iburasirazuba" "Iburengerazuba" ...
##  $ HASC_1   : chr  "RW.NO" "RW.SU" "RW.ES" "RW.OU" ...
##  $ CCN_1    : int  4 2 5 3 1
##  $ CCA_1    : chr  "4" "2" "5" "3" ...
##  $ TYPE_1   : chr  "Province" "Province" "Province" "Province" ...
##  $ ENGTYPE_1: chr  "Province" "Province" "Province" "Province" ...
##  $ NL_NAME_1: chr  "" "" "" "" ...
##  $ VARNAME_1: chr  "Northern Province|Province du Nord" "Southern Province|Province du Sud" "Eastern Province|Province de l'Est" "Western Province|Province de l'Ouestern Province|Province de l'Ouest" ...

Quickly Plot Spatial Data

We can quickly plot spatial dataframes using base plot. Note that now the map is not distorted even though we didn't tell it to project the data.

# Quick plot of Rwanda's first administrative division
plot(rwanda_l1)

Quickly Plot Spatial Data

We can quickly plot spatial dataframes.

# Quick plot of Rwanda's first and second administrative divisions
plot(rwanda_l1, lwd = 2)
plot(rwanda_l2, add = TRUE)

Making nice-looking maps

As discussed earlier, base plot is useful to create quick visualizations, but not so much for nice presentations.

However, ggplot can't interpret spatial dataframes. Consequently, we need to transform the data into a format that ggplot can understand. Here, we make a dataframe where each vertice of the polygon is an observation.

# Transform the sshapefile into a data set
rwanda_l1_df <- tidy(rwanda_l1, region = "ID_1")
rwanda_l2_df <- tidy(rwanda_l2, region = "ID_2")
head(rwanda_l2_df)
##       long       lat order  hole piece group id
## 1 29.82406 -1.308745     1 FALSE     1   1.1  1
## 2 29.82414 -1.308770     2 FALSE     1   1.1  1
## 3 29.82423 -1.308752     3 FALSE     1   1.1  1
## 4 29.82433 -1.308763     4 FALSE     1   1.1  1
## 5 29.82441 -1.308798     5 FALSE     1   1.1  1
## 6 29.82449 -1.308845     6 FALSE     1   1.1  1

Making nice-looking maps

The resulting dataframe from tidy:

  • Has the variable id, which is taken from the the variable in the region argument.
  • Doesn't have any of our other variables (e.g., food security).
  • We can merge our other variables from the original database into this dataframe using the correspondence in ID variables
  # Merging spatial data and original data set
  rwanda_l1_df <- merge(rwanda_l1_df, rwanda_l1, 
                        by.x="id", by.y="ID_1")
  
  rwanda_l2_df <- merge(rwanda_l2_df, rwanda_l2, 
                        by.x="id", by.y="ID_2")
##   id     long       lat order  hole piece ISO ID_1       NAME_1
## 1  1 29.82406 -1.308745     1 FALSE     1 RWA    1 Amajyaruguru
## 2  1 29.82414 -1.308770     2 FALSE     1 RWA    1 Amajyaruguru
## 3  1 29.82423 -1.308752     3 FALSE     1 RWA    1 Amajyaruguru
## 4  1 29.82433 -1.308763     4 FALSE     1 RWA    1 Amajyaruguru
## 5  1 29.82441 -1.308798     5 FALSE     1 RWA    1 Amajyaruguru
## 6  1 29.82449 -1.308845     6 FALSE     1 RWA    1 Amajyaruguru

Making nice-looking maps

Now, let's make a map

# Map first administrative division
ggplot() + 
  geom_polygon(data = rwanda_l1_df, 
               aes(x = long, 
                   y = lat,
                   group = group),
               colour = "black")    # Set color of line

Making nice-looking maps

Now, let's make a map

Making nice-looking maps

  • The argument alpha determines the transparency of the map's fill:
ggplot() + 
  geom_polygon(data = rwanda_l1_df,   ## Map first administrative division
               aes(x = long, 
                   y = lat,
                   group = group),
               colour = "black",       # Set color of line
               alpha = 0)              # Set transparency of fill

Making nice-looking maps

  • The argument alpha determines the transparence of the map's fill:

Using two shapefiles in a single map

To make it more interesting, let's also show the second administrative division. We do this by adding a second shapefile to the same plot:

ggplot() +                             ## Open graph device
  geom_polygon(data = rwanda_l2_df,    ## Map second administrative division
               aes(x = long, 
                   y = lat,
                   group = group, 
                   fill = id),          # One color fill per id
               alpha = .3) +            # Set transparence of fill
  geom_polygon(data = rwanda_l1_df,    ## Map first administrative division
               aes(x = long, 
                   y = lat,
                   group = group),
               colour = "black",         # Set color of line
               alpha = 0)                # Set transparence of fill

Using two shapefiles in a single map

Adding a basemap

Now let's add another basemap

# Create a bounding box around Rwanda
area <- bbox(rwanda_l1)                 # bbox() only works with spatial data

# Get the basemap from Google Maps
basemap <- get_map(location = area,
                   zoom = 8,
                   maptype = "satellite")

# Create the map
ggmap(basemap) + 
  geom_polygon(data = rwanda_l2_df,    ## Map second administrative division
               aes(x = long, 
                   y = lat,
                   group = group, 
                   fill = id),          # One color fill per id
               alpha = .3) +            # Set transparence of fill
  geom_polygon(data = rwanda_l1_df,    ## Map first administrative division
               aes(x = long, 
                   y = lat,
                   group = group),
               colour = "black",         # Set color of line
               alpha = 0)                # Set transparence of fill

Adding a basemap

Final adjustments

Some final touches to make it prettier:

ggmap(basemap) +                         ## Add basemap
  geom_polygon(data = rwanda_l2_df,      ## Map second administrative division
               aes(x = long,
                   y = lat,
                   group = group,
                   fill = id),            # One color fill per id
               alpha = .2) +              # Set transparence of fill
  geom_polygon(data = rwanda_l1_df,      ## Map first administrative division
               aes(x = long,
                   y = lat,
                   group = group),
               colour = "black",          # Set color of line
               alpha = 0) +               # Set transparence of fill
  coord_fixed(xlim = c(28.8, 31),         # Crop map to remove unnecessary borders
              ylim = c(-3,-1)) +
  theme_void() +                          # Remove ticks
  theme(legend.position = "none")         # Remove legend

Final adjustments